126 research outputs found
Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code
iPIC3D is a widely used massively parallel Particle-in-Cell code for the
simulation of space plasmas. However, its current implementation does not
support execution on multiple GPUs. In this paper, we describe the porting of
the iPIC3D particle mover to GPUs and the optimization steps to increase the
performance and parallel scaling on multiple GPUs. We analyze the strong
scaling of the mover on two GPU clusters and evaluate its performance and
acceleration. The optimized GPU version, which uses pinned memory and
asynchronous data prefetching, outperforms the corresponding CPU version by
5-10x on two different systems equipped with NVIDIA K80 and V100 GPUs.
Comment: Accepted for publication in ICCS 201
A body at the edge of language: writing anorexia, bulimia and recovering
This practice-led life writing project explores this writer-scholar's experience of her eating disorder through a series of poetic essays developed from material and somatic writing methods including ink-and-paper, found text, and movement. Through these particular methods, and the episodic acts of the writing itself, this PhD discovers a form of somatic life writing that both demonstrates and analyses the lived experience of this psycho-somatic disorder. This research project responds to the challenges of writing anorexia, bulimia and recovering, by developing material writing methods to negotiate self-erasure, narrative authority and embodied memory on the page. The PhD examines the symbiotic relation between writing and (not) eating in ways that are analogous, metaphoric and mutually affective. It draws on a range of writers and feminist materialist scholars to propose that when the tensions of eating disorder are transposed to language and navigated on the page, moments can be found where bodies and writing are constituted and de-constituted. In locating their life-affirming entanglement, this writing practice counteracts the erasure and containment of the condition
NVIDIA Tensor Core Programmability, Performance & Precision
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called
"Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices
per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta
microarchitecture, provides 640 Tensor Cores with a theoretical peak
performance of 125 Tflops/s in mixed precision. In this paper, we investigate
current approaches to programming NVIDIA Tensor Cores, their performance and the
precision loss due to computation in mixed precision.
Currently, NVIDIA provides three different ways of programming
matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply
Accumulate (WMMA) API, CUTLASS, a templated library based on WMMA, and cuBLAS
GEMM. After experimenting with different approaches, we found that NVIDIA
Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100
GPU, seven and three times the performance in single and half precision
respectively. A WMMA implementation of batched GEMM reaches a performance of 4
Tflops/s. While precision loss due to matrix multiplication with half precision
input might be critical in many HPC applications, it can be considerably
reduced at the cost of increased computation. Our results indicate that HPC
applications using matrix multiplications can strongly benefit from using
NVIDIA Tensor Cores.
Comment: This paper has been accepted by the Eighth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES) 201
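The trade-off between precision loss and extra computation mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how the loss is reduced, so the splitting scheme below (keeping a half-precision residual for each input and adding two correction products) is an assumption chosen for illustration, not necessarily the authors' method. The function names are hypothetical, and the accumulation is done in Python's double precision rather than in single precision as on a Tensor Core.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_half_inputs(a, b):
    """Dot product with both inputs rounded to half precision first.

    Assumption: accumulation happens in Python floats (double precision),
    so only the input rounding error is visible here.
    """
    return sum(to_half(x) * to_half(y) for x, y in zip(a, b))


def dot_refined(a, b):
    """Dot product with a half-precision residual correction.

    Each input x is split into hi = half(x) and lo = half(x - hi).
    Three products per element are used instead of one, which is the
    "increased computation" paid for the smaller error.
    """
    total = 0.0
    for x, y in zip(a, b):
        x_hi, y_hi = to_half(x), to_half(y)
        x_lo, y_lo = to_half(x - x_hi), to_half(y - y_hi)
        total += x_hi * y_hi + x_hi * y_lo + x_lo * y_hi
    return total


if __name__ == "__main__":
    random.seed(0)
    a = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    b = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    exact = sum(x * y for x, y in zip(a, b))
    print("error with half inputs:", abs(dot_half_inputs(a, b) - exact))
    print("error with refinement: ", abs(dot_refined(a, b) - exact))
```

The refined version performs three multiplications per element pair instead of one, so the accuracy gain is paid for directly in arithmetic work.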
TensorFlow Doing HPC
TensorFlow is a popular emerging open-source programming framework supporting
the execution of distributed applications on heterogeneous hardware. While
TensorFlow was initially designed for developing Machine Learning (ML)
applications, it aims to support the development of a much broader range of
applications outside the ML domain, possibly including HPC applications.
However, very few experiments have been
conducted to evaluate TensorFlow performance when running HPC workloads on
supercomputers. This work addresses this lack by designing four traditional HPC
benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG)
solver and Fast Fourier Transform (FFT). We analyze their performance on two
supercomputers with accelerators and evaluate the potential of TensorFlow for
developing HPC applications. Our tests show that TensorFlow can fully take
advantage of high performance networks and accelerators on supercomputers.
Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical
communication bandwidth on our testing platform. We find an approximately 2x,
1.7x and 1.8x performance improvement when increasing the number of GPUs from
two to four in the matrix-matrix multiply, CG and FFT applications
respectively. All our performance results demonstrate that TensorFlow has high
potential to emerge also as an HPC programming framework for heterogeneous
supercomputers.
Comment: Accepted for publication at The Ninth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES'19
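The scaling figures quoted in this abstract can be turned into parallel efficiencies with a short calculation. This sketch assumes that the reported "performance improvement" is the speedup obtained when going from two to four GPUs, and it uses the approximate values as given; the function name is hypothetical.

```python
def parallel_efficiency(speedup: float, resource_ratio: float) -> float:
    """Fraction of the ideal speedup that was actually achieved.

    speedup        -- measured performance ratio (new / old)
    resource_ratio -- ratio of resources used (new / old), e.g. 4 GPUs / 2 GPUs
    """
    return speedup / resource_ratio


# Approximate speedups quoted in the abstract for two -> four GPUs.
reported_speedups = {
    "matrix-matrix multiply": 2.0,
    "CG": 1.7,
    "FFT": 1.8,
}

for name, speedup in reported_speedups.items():
    efficiency = parallel_efficiency(speedup, resource_ratio=4 / 2)
    print(f"{name}: {efficiency:.0%} of ideal scaling")
```

Under that assumption, doubling the GPU count gives roughly 100%, 85% and 90% of ideal scaling for the three applications.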
Signatures of Secondary Collisionless Magnetic Reconnection Driven by Kink Instability of a Flux Rope
The kinetic features of secondary magnetic reconnection in a single flux rope
undergoing internal kink instability are studied by means of three-dimensional
Particle-in-Cell simulations. Several signatures of secondary magnetic
reconnection are identified in the plane perpendicular to the flux rope: a
quadrupolar electron and ion density structure and a bipolar Hall magnetic
field develop in proximity of the reconnection region. The most intense
electric fields form perpendicularly to the local magnetic field, and a
reconnection electric field is identified in the plane perpendicular to the
flux rope. An electron current develops along the reconnection line in the
opposite direction of the electron current supporting the flux rope magnetic
field structure. Along the reconnection line, several bipolar structures of the
electric field parallel to the magnetic field occur making the magnetic
reconnection region turbulent. The reported signatures of secondary magnetic
reconnection can help to localize magnetic reconnection events in space,
astrophysical and fusion plasmas.
Nonlinear evolution of the magnetized Kelvin-Helmholtz instability: from fluid to kinetic modeling
The nonlinear evolution of collisionless plasmas is typically a multi-scale
process where the energy is injected at large, fluid scales and dissipated at
small, kinetic scales. Accurately modelling the global evolution requires
taking into account the main micro-scale physical processes of interest. This is
why comparison of different plasma models is today an imperative task aiming at
understanding cross-scale processes in plasmas. We report here the first
comparative study of the evolution of a magnetized shear flow, through a
variety of different plasma models by using magnetohydrodynamic, Hall-MHD,
two-fluid, hybrid kinetic and full kinetic codes. Kinetic relaxation effects
are discussed to emphasize the need for kinetic equilibria to study the
dynamics of collisionless plasmas in nontrivial configurations. Discrepancies
between models are studied both in the linear and in the nonlinear regime of
the magnetized Kelvin-Helmholtz instability, to highlight the effects of small
scale processes on the nonlinear evolution of collisionless plasmas. We
illustrate how the evolution of a magnetized shear flow depends on the relative
orientation of the fluid vorticity with respect to the magnetic field direction
during the linear evolution when kinetic effects are taken into account. Even
though we find that small scale processes differ between the different models, we
show that the feedback from small, kinetic scales to large, fluid scales is
negligible in the nonlinear regime. This study shows that the kinetic modeling
validates the use of a fluid approach at large scales, which encourages the
development and use of fluid codes to study the nonlinear evolution of
magnetized fluid flows, even in the collisionless regime.
Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs
Analytic, first-principles performance modeling of distributed-memory
parallel codes is notoriously imprecise. Even for applications with extremely
regular and homogeneous compute-communicate phases, simply adding communication
time to computation time often does not yield a satisfactory prediction of
parallel runtime due to deviations from the expected simple lockstep pattern
caused by system noise, variations in communication time, and inherent load
imbalance. In this paper, we highlight the specific cases of provoked and
spontaneous desynchronization of memory-bound, bulk-synchronous pure MPI and
hybrid MPI+OpenMP programs. Using simple microbenchmarks we observe that
although desynchronization can introduce increased waiting time per process, it
does not necessarily cause lower resource utilization but can lead to an
increase in available bandwidth per core. In case of significant communication
overhead, even natural noise can shove the system into a state of automatic
overlap of communication and computation, improving the overall time to
solution. The saturation point, i.e., the number of processes per memory domain
required to achieve full memory bandwidth, is pivotal in the dynamics of this
process and the emerging stable wave pattern. We also demonstrate how hybrid
MPI+OpenMP programming can prevent desirable desynchronization by eliminating
the bandwidth bottleneck among processes. A Chebyshev filter diagonalization
application is used to demonstrate some of the observed effects in a realistic
setting.
Comment: 18 pages, 8 figures
Kinetic simulations of magnetic reconnection in presence of a background O+ population
Particle-in-Cell simulations of magnetic reconnection with an H+ current
sheet and a mixed background plasma of H+ and O+ ions are completed using
physical mass ratios. Four main results are shown. First, the O+ presence
slightly decreases the reconnection rate and the magnetic reconnection
evolution depends mainly on the lighter H+ ion species in the presented
simulations. Second, the Hall magnetic field is characterized by a two-scale
structure in the presence of O+ ions: it reaches sharp peak values in a small area
in proximity of the neutral line, and then decreases slowly over a large
region. Third, the two background species initially separate in the outflow
region because H+ and O+ ions are accelerated by different mechanisms occurring
on different time scales and with different strengths. Fourth, the effect of a
guide field on the O+ dynamics is studied: the O+ presence does not change the
reconnected flux and all the characteristic features of guide field magnetic
reconnection are still present. Moreover, the guide field introduces an O+
circulation pattern between separatrices that enhances high O+ density areas
and depletes low O+ density regions in proximity of the reconnection fronts.
The importance and the validity of these results are finally discussed.